
Search in the Catalogues and Directories

Hits 1 – 20 of 30

1. Shapley Idioms: Analysing BERT Sentence Embeddings for General Idiom Token Identification. In: Front Artif Intell (2022).
2. Semantic Relatedness and Taxonomic Word Embeddings ...
3. English WordNet Taxonomic Random Walk Pseudo-Corpora. In: Conference papers (2020).
4. Language related issues for machine translation between closely related south Slavic languages. Arcan, Mihael; Klubicka, Filip; Popovic, Maja. The COLING 2016 Organizing Committee, 2019.
5. Synthetic, Yet Natural: Properties of WordNet Random Walk Corpora and the impact of rare words on embedding performance. In: Conference papers (2019).
6. Size Matters: The Impact of Training Size in Taxonomically-Enriched Word Embeddings. In: Articles (2019).
Abstract: Word embeddings trained on natural corpora (e.g., newspaper collections, Wikipedia or the Web) excel at capturing thematic similarity ("topical relatedness") for word pairs such as 'coffee' and 'cup' or 'bus' and 'road'. However, they are less successful on pairs showing taxonomic similarity, like 'cup' and 'mug' (near synonyms) or 'bus' and 'train' (types of public transport). Conversely, purely taxonomy-based embeddings (e.g. those trained on a random walk over WordNet's structure) outperform natural-corpus embeddings on taxonomic similarity but underperform them on thematic similarity. Previous work suggests that performance gains on both types of similarity can be achieved by enriching natural-corpus embeddings with taxonomic information from taxonomies like WordNet, combining the natural-corpus embeddings with taxonomic embeddings trained on such random walks. This paper conducts a deep analysis of this assumption and shows that both the size of the natural corpus and the random-walk coverage of the WordNet structure play a crucial role in the performance of combined (enriched) vectors on both similarity tasks. Specifically, we show that embeddings trained on medium-sized natural corpora benefit the most from taxonomic enrichment, whilst embeddings trained on large natural corpora only benefit from this enrichment when evaluated on taxonomic similarity tasks. The implication is that care has to be taken in controlling the size of the natural corpus and the size of the random walk used to train vectors. In addition, we find that, whilst the WordNet structure is finite and it is possible to fully traverse it in a single pass, the repetition of well-connected WordNet concepts in extended random walks effectively reinforces taxonomic relations in the learned embeddings.
Keywords: Computational Engineering; Computational Linguistics; retrofitting; semantic similarity; taxonomic embeddings; taxonomic enrichment; word embeddings; WordNet
URL: https://arrow.tudublin.ie/cgi/viewcontent.cgi?article=1090&context=scschcomart
https://arrow.tudublin.ie/scschcomart/83
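The enrichment pipeline described in the abstract above starts from a pseudo-corpus of random walks over a taxonomy's graph. A minimal sketch of that corpus-generation step, using a small hand-written toy taxonomy in place of the real WordNet graph (the graph, function names, and parameters here are all illustrative, not the paper's actual implementation):

```python
import random

# Toy taxonomy graph (node -> neighbour nodes), a stand-in for
# WordNet's hypernym/hyponym structure.
taxonomy = {
    "vehicle": ["bus", "train", "car"],
    "bus": ["vehicle", "minibus"],
    "train": ["vehicle"],
    "car": ["vehicle"],
    "minibus": ["bus"],
}

def random_walk(graph, start, steps, rng):
    """Follow random edges from `start`, emitting each visited node."""
    walk = [start]
    node = start
    for _ in range(steps - 1):
        neighbours = graph.get(node, [])
        if not neighbours:
            break  # dead end: stop this walk early
        node = rng.choice(neighbours)
        walk.append(node)
    return walk

def pseudo_corpus(graph, n_walks, steps, seed=0):
    """Generate `n_walks` walks; each walk acts as one pseudo-sentence."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    return [random_walk(graph, rng.choice(nodes), steps, rng)
            for _ in range(n_walks)]

corpus = pseudo_corpus(taxonomy, n_walks=3, steps=6)
for sentence in corpus:
    print(" ".join(sentence))
```

Pseudo-sentences produced this way would then be fed to an ordinary embedding trainer (e.g. word2vec); because well-connected concepts like "vehicle" recur across many walks, longer walks repeat them more often, which is the reinforcement effect the abstract notes.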
7. Training corpus hr500k 1.0. Ljubešić, Nikola; Agić, Željko; Klubička, Filip. Jožef Stefan Institute, 2018.
8. Quantitative Fine-Grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian ...
9. Is it worth it? Budget-related evaluation metrics for model selection ...
10. Quantitative Fine-grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian. In: Articles (2018).
11. Is it worth it? Budget-related evaluation metrics for model selection. In: Conference papers (2018).
12. hr500k – A Reference Training Corpus of Croatian. In: Conference papers (2018).
13. Croatian Twitter training corpus ReLDI-NormTag-hr 1.1. Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. Jožef Stefan Institute, 2017.
14. Serbian Twitter training corpus ReLDI-NormTag-sr 1.0. Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. Jožef Stefan Institute, 2017.
15. Croatian Twitter training corpus ReLDI-NormTag-hr 1.0. Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. Jožef Stefan Institute, 2017.
16. Serbian Twitter training corpus ReLDI-NormTag-sr 1.1. Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. Jožef Stefan Institute, 2017.
17. Fine-grained human evaluation of neural versus phrase-based machine translation ...
18. Fine-Grained Human Evaluation of Neural Versus Phrase-Based Machine Translation. In: Prague Bulletin of Mathematical Linguistics, Vol 108, Iss 1, pp. 121-132 (2017).
19. Serbian-English parallel corpus srenWaC 1.0. Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio. Jožef Stefan Institute, 2016.
20. Finnish-English parallel corpus fienWaC 1.0. Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio. Jožef Stefan Institute, 2016.


Open access documents: 30
© 2013 - 2024 Lin|gu|is|tik